Properties and use of CMB power spectrum likelihoods 
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Fast robust methods for calculating likelihoods from CMB observations on small scales generally 
rely on approximations based on a set of power spectrum estimators and their covariances. We 
investigate the optimality of these approximation, how accurate the covariance needs to be, and 
how to estimate the covariance from simulations. For a simple case with azimuthal symmetry 
we compare optimality of hybrid pseudo-C; CMB power spectrum estimators with the exact result, 
indicating that the loss of information is not negligible, but neither is it enough to have a large effect 
on standard parameter constraints. We then discuss the number of samples required to estimate the 
covariance from simulations, with and without a good analytic approximation, and assess the use 
of shrinkage estimators. Finally we discuss how to combine an approximate high-/ likelihood with a 
more exact low-/ harmonic-space likelihood as a practical method for accurate likelihood calculation 
on all scales. 



I. INTRODUCTION 

The Cosmic Microwave Background (CMB) appears to 
be isotropic and Gaussian to a good approximation, and 
hence allows robust statistical constraints to be placed on 
a variety of cosmological parameters. However high reso- 
lution observations such as those from the Planck satellite 
produce sky maps with many millions of pixels, for which 
performing an exact likelihood analysis becomes numeri- 
cally very difficult. Most data analyses therefore use fast 
robust approximations based on power spectrum estima- 
tors [0 |j. While these are expected to be suboptimal 
at some level, they should be unbiased on average. The 
main advantage of a fast method is that multiple simula- 
tions can be performed to assess in detail the propagation 
of errors from numerous instrumental processes into the 
final results, and hence in practice may also be signifi- 
cantly more reliable than an in-principle more optimal 
method. 

In a previous paper we investigated in detail various 
approximations for calculating cosmological parameter 
likelihoods from high-resolution CMB power spectrum 
estimators, and showed that indeed good approximations 
can be found that produce unbiased results [§. These 
approximations effectively transform a set of power spec- 
trum estimators so the likelihood of a theoretical power 
spectrum can be written in a Gaussian form, then evalu- 
ate this Gaussian function using an estimate of the power 
spectrum estimator covariance. In simple cases this co- 
variance can be calculated to reasonable accuracy using 
analytic approximations, though ideally it should be as- 
sessed by performing large numbers of full data-analysis 
simulations, fully accounting for noise, sky cuts, noise 
correlations, beam effect, map-making errors, etc. Addi- 
tional complications may additionally be accounted for 
by modifying the theory power spectrum, for example 



due to beam or foreground uncertainty modes. 

In this paper we aim to assess how much information 
is being lost by using a fast pseudo-C; method compared 
to a more optimal method (e.g. maximum likelihood or 
Gibbs sampling Q), and whether the increase in error 
bars from using a suboptimal method has a significant 
effect on cosmological parameters. We then assess how to 
estimate the covariance from simulations, and how many 
simulations are required under various assumptions. We 
show how information from approximate models can be 
combined with simulations by using shrinkage estimators 
or by fitting model parameters. Finally we suggest how 
an approximate power spectrum likelihood at high / can 
be combined with a more exact low-Z likelihood for evalu- 
ating the total likelihood, consistently accounting for the 
contribution of high I power to the low-Z mode variance. 

We focus on nearly full-sky observations, for example 
from the WMAP or Planck satellites, where only a small 
fraction 15%) of the sky is cut out due to foreground 
or point-source contamination. For simplicity we discuss 
mainly the CMB temperature, which is what drives the 
parameter constraints, though in the appendix we give a 
general harmonic method for calculating the low-/ likeli- 
hood in harmonic space. Since the paper is only likely 
to be of interest to experts in the field, we refer to the 
extensive literature for introductory material. 



II. HOW OPTIMAL ARE PSEUDO-C; 
LIKELIHOODS? 

We focus on hybrid pseudo-C; power spectrum estima- 
tors, constructed by combining sets of pseudo-C/ estima- 
tors from maps with different weightings. For details of 
how the pseudo-C; estimators are constructed, and how 
the covariance can be estimated, see Refs. ^ ^, 0, ||, 
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^, |T^, 12, When the sky is noise dominated, the 
pseudo-C/ estimator with inverse-noise weighting is opti- 
mal; when the noise is negligible a uniform weighting is 
optimal. Combining different pseudo-C; estimators with 
different weightings gives hybrid estimators that interpo- 
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late smoothly between the regimes, giving significantly 
smaller error bars at all / than a single weight function 
could §. 

Over the years various code comparisons and tests have 
indicated that pseudo-C; estimator are in fact rather 
good at high I, in that their variance is typically within 
0(10%) of that expected for a maximum likelihood es- 
timator (e.g. Refs. [0, |l2|). However the degree of sub- 
optimality will depend on the noise and sky cut under 
consideration. Due to its scanning strategy, the Planck 
satellite will have highly anisotropic noise, and we would 
like to assess whether this is likely to be a problem for 
pseudo-C; methods. Here we perform a comparison in the 
case of a strongly anisotropic noise, but with azimuthal 
symmetry so that the optimal Fisher errors can be cal- 
culated exactly for comparison. Though this situation 
is clearly unrealistic, it should give a good idea of the 
amount of suboptimality that can be expected in prac- 
tice. Our main concern is the anisotropy of the noise, 
so we shall consider the full sky; the main advantage 
of doing this is that the covariance of the pseudo-C; es- 
timators can then be calculated accurately analytically, 
so that our comparison is assessing the information loss 
due to sub-optimal handling of noise anisotropy, rather 
than depending on errors in the covariance calculation. 
We discuss the effects of covariance errors in a later sec- 
tion. We focus on the high-Z temperature power spectrum 
since this dominates the parameter constraints from the 
Planck satellite; for discussion of how to improve polar- 
ization pseudo-Ci estimators see Refs. 
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A. Comparing the Fisher matrix and the 
pseudo-C;'s variance in the full sky 

The Cramer-Rao inequality states that the inverse of 
the Fisher information is a lower bound on the variance 
of any unbiased estimator, so we use the Fisher matrix to 
quantify the errors on the power spectrum that one could 
hope to get using an optimal method. For a Gaussian 
likelihood function £, the Fisher matrix, F, for the power 
spectrum Ci at some fiducial model is given by (see e.g. 
Ref. dl) 
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where C = 5 + AT, with S and TV being the signal 
and noise covariances, respectively. For a statistically 
isotropic signal on the full sky S is diagonal in harmonic 
space, and for pixel-uncorrelated noise N can be cal- 
culated easily in terms of the pixel noise variance. We 
make the assumption of azimuthal symmetry so that 
Ni'm'im = Smm'Niimim, which makcs the Fisher matrix 
numerically tractable by putting it into block diagonal 
form (with blocks for each m). The Fisher matrix then 



Fw = 
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which is simply half the square of the inverse of the co- 
variance matrix (signal plus noise) summed over m. 

We compare this with the errors expected using 
pseudo-C/ power spectrum estimators. For weight func- 
tions labelled by i and j there are a set of pseudo-Cj 
and a covariance 



estimators Cj'' 
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Explicit expressions for the covariances are given in e.g. 
Ref. 1^ . The hybrid pseudo-C/ estimators of Ref. ||] 
are a combination of pseudo-C; 's with different combina- 
tions of weights. In the low noise regime, the best weight 
function, uj^fl) is close to uniform in order to minimize 
the cosmic variance. However, in the high noise regime, 
the weight function is proportional to the inverse-noise 
to reduce noise itself Therefore, a natural choice 
would be to choose a combination of results from uniform 
and inverse-noise weight functions. Here we consider the 
simplest case of combining the two estimators calculated 
separately from the two weighted maps. The full hybrid 
covariance matrix then reads: 



(AQ'^AC/; 




(4) 



where H^,^ is the inverse of the covariance matrix 
(AC^ACP), with A and B denoting the different weight 
functions used for the C/ contributing to the hybrid esti- 
mator. 

We can now proceed to compare the errors on indi- 
vidual Ci and their correlations. However for parameter 
estimation we are not just interested in individual /: the 
Ci are smooth functions of I, and the effects of param- 
eters enter in a smooth way over a range of I. We are 
therefore more interested in the constraint on the ampli- 
tude over some smoothing scale, as this will more closely 
determine how well we can constrain different parame- 
ters. The constraint over a range of scales depends on 
the correlations of different individual I, so considering a 
range of I also has the advantage of checking any effects 
due to correlations. 

We therefore also evaluate the error of an amplitude 
parameter. A, over some range in Al, so that the power 
spectrum over the range Al is given by AC™, and else- 
where by C;", for some fiducial model Cj",. The Fisher 
matrix for amplitude A over Al is then 



/max /max 
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1 if / and I' in (Al) 
otherwise 
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FIG. 1: A comparison between the Fisher errors and the in- 
verse of the hybrid estimator variance for the different weight- 
ing combinations considered. The red (solid), blue (dotted) 
and green (dashed) curves represent the ratios of Fisher errors 
to the inverse variance of uniform noise weighting, inverse- 
variance noise weighting and hybrid, respectively. 



FIG. 2; A comparison of the errors in the power over a range 
a; = 2 (solid line), Al = 10 (dashed line) and AZ = 50 
(dot-dashed line), for bands starting at Imin- The red, blue 
and green curves represent the ratios of Fisher errors to the 
inverse variance of uniform noise weighting, inverse-variance 
noise weighting and hybrid, respectively. 



This can be compared to using the pseudo-C/ estima- 
tors in a Gaussian likelihood approximation: 

-2ln£f{Ci) = ^(Q - Ci)[My%,{Ci, ~ Q,), (6) 
II' 

where [Mf] is the covariance for a fiducial model. Dif- 
ferentiating twice w.r.t A gives: 

- ^{InCf) ^Y.^r[Mi'WCr, (7) 
w 

where the sum is over the I in the bin being considered. 
Note that we are considering the case of generating esti- 
mators at each Z, and then using the covariance to esti- 
mate the power over a range of I. This is not the same as 
making a binned Ci estimator (depending on some win- 
dow function range of I), which may well have different 
properties. Full sky observations such as WMAP and 
Planck are normally analysed into individual Ci estima- 
tors, the case we consider, though there may be advan- 
tages to also considering binned estimators. 

To make an error comparison, we use an azimuthally 
averaged (in ecliptic coordinates) version of the noise 
expected for the Planck satellite and consider an 
isotropic 7arcmin-fwhm Gaussian beam. 

We compare the inverse of the hybrid variance (the 
diagonal elements of Eq. (^) to the Fisher errors (the 



diagonals of Eq. (||)) in Fig. |l|. The figure also shows 
the uniform and inverse-weighted estimators separately. 
Although the variance of the uniform noise weighting is 
almost identical to the fisher error at low Z, it deviates 
considerably when noise dominates. On the other hand, 
the variance of the inverse- variance noise weighting case 
remains poor even at high Z where noise is starting to 
dominate over the signal. The hybrid estimators signif- 
icantly improves the errors, giving results within a few 
percent of the ideal Fisher errors at all I. 

Fig. H compares the results for the amplitude over a 
range AZ = 2, 10, 50. In this case a larger range of I gives 
relatively worse hybrid errors compared to the Fisher er- 
rors, though the hybrid errors are still within 10% at all I. 
One interpretation of the effect of bin size might be that 
the optimal result is combining correlation information 
between different scales significantly more efficiently than 
the hybrid estimator. This is perhaps not surprising since 
the Fisher result 'knows' about correlations between all 
the I and m modes individually, whereas the hybrid ap- 
proximation has compressed the correlation information 
into a covariance matrix accounting only for I correla- 
tions. The hybrid estimator could be further improved 
by including more pseudo-C/ estimators in the mix if 
greater optimality is desired, for example the uniform- 
inverse weight estimator; however as we see below the 
level of optimality found here is likely to be sufficient in 
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FIG. 3: Parameter constraints from a single idealized Plank- 
like simulation with anisotropic noise. The 1-dimensional 
marginalized posteriors are from the fiducial Gaussian ap- 
proximation with the covariance being from the Fisher ma- 
trix or the pseudo-Ci estimator covariance. The black (solid) 
Une uses the Fisher covariance, blue (dotted) is the inverse- 
noise weighted estimator, green (dot-dashed) is with uniform 
weight, and red (dashed) uses the hybrid estimator. The re- 
sults are very consistent as constraints are driven by scales 
where the uniform weighting is nearly optimal. 



most cases. 



B. Effect on parameter estimation 

Since the hybrid pseudo-C; may be suboptimal at up 
to the 10% level, it is also useful to directly check the 
effect on parameter estimation. We use full-sky Planck- 
like simulations with azimuthal symmetry as previously 
described. We consider a standard 6-parameter ACDM 
model (with running of the spectral index, but the optical 
depth fixed since we exclude / < 30 where a more optimal 
likelihood analysis is possible). For simplicity and for a 
quick check, we consider the fiducial Gaussian likelihood 
approximation since it depends on a pre-computed co- 
variance matrix. We calculate the likelihood with the 
Fisher covariance (Eq. (||)) and with the hybrid covari- 
ance (Eq. (Q)) using the CosmoMC parameter estimation 
code to sample from the posterior parameter distribu- 
tion Q. We use a fiducial analytic model for the Ci, 
corresponding to averaging the log-likelihood over real- 
izations. 

Fig. H shows the results, which are very similar. This 
is not a surprise since Planck can measure many acous- 



tic peaks in the CMB power spectrum, so most param- 
eters are well constrained from the cosmic- variance lim- 
ited region where the hybrid estimator is close to opti- 
mal. The effect could be somewhat larger in extended 
parameter spaces where models differ significantly only 
over the region where the hybrid estimator is significantly 
sub-optimal. 



III. HOW TO CALCULATE THE 
COVARIANCE? 

In the simple cases we have considered above the es- 
timator covariance can be calculated accurately analyti- 
cally. In more realistic cases this is unlikely to be the case: 
even a simple sky cut renders the commonly-used covari- 
ance approximates accurate at only the 0(10%) level, de- 
pending on apodization (see e.g. [Q). Ideally we would 
like to be able to evaluate the covariance (in some fiducial 
model) directly from simulations of the full time stream, 
including all relevant instrumental effects. This is the 
approach adopted by the MASTER/xFASTER pipchnes 
used by many CMB observations with data over only a 
small part of the sky |l9| ^ ; the advantage of a Monte- 
Carlo approach is that numerous complicated effects can 
be included straightforwardly, where an analytic result 
would be intractable. It should however be remembered 
that if the likelihood approximation is derived to be good 
for near-isotropic-Gaussian fields, including systematics 
in the covariance may invalidate the likelihood approxi- 
mation. For example beam uncertainty modes effectively 
scale the theory power spectrum coherently over a large 
range of scales, so beam uncertainties may be better ac- 
counted for by including them as extra parameters af- 
fecting the theory C'l in the parameter estimation code, 
rather than as part of the likelihood function for beam- 
uncertain estimators |1^]. 

The maximum likelihood estimator of the covariance 
of a set of zero-mean samples of a Gaussian vector n is 

M = i^n,,nf, (8) 

i 

where the sum is over the n independent samples. If our 
only knowledge about the covariance comes from simula- 
tions, we should account for the sampling uncertainty in 
the true covariance by marginalizing over the probability 
distribution of the true covariance given the Monte-Carlo 
estimate; see appendix ^ where we give mathematical 
details in an idealized case. 

If there are a large number of samples, the uncertainty 
in the true covariance may be negligible, so the Monte- 
Carlo covariance can be used directly, but the question 
is: how many samples do you need? There is a basic 
lower limit of n > p, where p is the dimensionality of 
the covariance; this is required to have p independent 
modes sampled, and hence for the covariance estimate to 
be invertible. Already in the case of the power spectrum 
estimators this is a non-trivial requirement: for WMAP 
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temperature p ~ 10'^, but due to the very high numer- 
ical cost they were only able to do a dozen or so full 
time-stream simulations. For the distribution of the true 
covariance given the estimator to be normalizable we ac- 
tually need n > 2p, and for fractional accuracy better 
than a on the error bars, we require n > 2p/a. Indeed to 
have the numerical value of the accurate, one needs 
n ^ , which is getting to be very time consuming for 
un-binned C/ estimators, even for simplified map-level 
simulations. However in practice it is not usually re- 
quired to have the accurate, as long as the variation 
under changes in parameters is accurate, and hence the 
correct parameter constraints are obtained. For further 
discussion see Appendix |^ and Refs. ||2^, [2^ . 

In a realistic situation we may have some idea of the co- 
variance from approximate analytical results, but would 
also like to calibrate it from simulations to account for 
complications that are not included in the analytical 
model. An estimator can use both the approximation 
and the simulations. Perhaps the simplest case to con- 
sider is where there is a prior estimate that one considers 
to be about as accurate as that from q simulations. Then 
adding the information from n actual simulations, the 
best estimate of the true covariance will be the weighted 
sum of the two covariance estimates (see appendix ^). 
This is a simple example of a shrinkage estimator; it 
'shrinks' noisy simulated estimators towards some prior 
target, giving a new estimator that should be better than 
both individually. In the limit of many simulations the 
new estimator is simulation dominated; with few simula- 
tions it is prior dominated. 



of this weighted combination is to form a regularized es- 
timator that outperforms both estimators individually. 
The shrinkage intensity is chosen in a way that would 
optimize the estimator, for example by minimizing the 
mean-squared error: 



MSE(A) = {Y,{s\ 



(10) 



with the angle brackets being the expectation value. 
Ledoit and Wolf |^ derived a analytic solution for A 
to minimize this error: 



A* = 



X);[var(J;) -COv(t;,S;) -bias(5;)(t; - Si)] 
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(11) 



For a practical use of Eq. (p, l[), Ref. |2J suggest replac- 
ing the covariance, variance and bias with their unbiased 
samples estimates (cov, var, bias). If s is an unbiased es- 
timator, then the bias term may be omitted. The new 
expression of the shrinkage intensity then reads: 



A* = 



X];[var(s;) - c6v(t;, S;)] 



(12) 



In practice we use min(l. A*). 

The shrinkage method can be applied directly to esti- 
mate the covariance matrix when s is taken to be a vector 
of the distinct elements of the covariance matrix. If we 
have n measurements of a data vector x of size p, and 



we assume that xp'' is the fc*'' measurement of the I 
element of x, then the empirical mean of the variable xi 
is: 
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A. Shrinkage estimators 

Shrinkage estimators were originally introduced for the 
situation in which there are many fewer samples than 
numbers of dimensions, so that the maximum- likelihood 
covariance estimate is not even invertible. This is a com- 
mon situation in many problems, and shrinkage provides 
a method to regularize the estimate in a well-defined way 
so that it is invertible. Here we would ideally also like 
to have very few simulations, but we also require good 
accuracy of the answer, and may prefer to generate more 
samples than have an inaccurate answer. In this section 
we investigate whether shrinkage estimators are useful 
for CMB likehhoods. 

A shrinkage estimator s* for a quantity s is constructed 
from a linear combination of some 'target' t and an esti- 
mator s calculated from samples [pSl p3, p5|: 



s* = At + (1 - A)s, 



(9) 



with A, the shrinkage intensity, being in the range to 1. 
The target can either be a fixed prior, or it could some ap- 
proximate estimate from the data, or some combination. 
If A = then the shrinkage estimate equals the unre- 
stricted estimate, s. If, however, A = 1 then the target 
estimate, t, is recovered. Therefore, the main advantage 
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The unbiased empirical covariance, S is given by: 



Siv = cawixi.xv) 



n - 1 



where we have set: 



k=l 



w/,!=) = (xp)-x,)(4'^-x.). 
Similarly, the required variance can be estimated as 

var(^.') = 7;^E(W^r-W.O^. 



(13) 
(14) 

(15) 
(16) 

(17) 



k=l 



If the target is fixed (does not depend on the data) then 
the covariance term vanishes; if not, it should be included 
to account for the correlation. We estimate the shrinkage 
intensity by replacing s and t in Eq. (^ij) with S and T, 
where T is some arbitrary target matrix. Therefore, the 
optimized shrinkage covariance now takes the form: 



M = A*T-|- (1 - A*)S. 



(18) 
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The shrinkage estimator is designed to use information 
from the sample estimator roughly in proportion to how 
relatively accurate it is. If we have a target that is say 
10% accurate, S needs to have < 10% accuracy to signif- 
icantly improve on the target. For high precision results, 
the estimator needs to be accurate, so the shrinkage es- 
timator requires the same order of number of samples as 
the direct accurate estimator. It simply has the advan- 
tage of being slightly superior, and not breaking down 
rapidly as the number of samples is decreased. 

Note that it is not at all clear in the current context 
that minimizing the Frobenius norm of the error (Eq. [l0| ) 
is the best thing to do. Indeed if the diagonal and non- 
diagonal elements of the covariance have very different 
distributions, it is not a good idea to use a shrinkage es- 
timator derived by treating them on equal terms. One 
can however equally well apply a shrinkage estimator to 
subsets of parameters separately. Since the off-diagonal 
terms are generally small, the number of simulations re- 
quired to estimate them accurately is very large: for a 
given number of samples the shrinkage intensity A* will 
be significantly larger for the off-diagonal than the diag- 
onal components. Also note that the shrinkage intensity 
A* is not independent of the scaling of the elements (e.g. 
using Ci or 1(1 + l)Ci gives different results). We there- 
fore apply the estimator to the covariance matrix after 
taking out the approximate CjCj' scaling of the elements 
(e.g. normalizing so that the target has unit diagonal). 

The effect of the shrinkage estimator on estimated like- 
lihoods is illustrated in Fig. ^ where separate shrinkage 
estimates are used for the diagonal and off-diagonal parts 
of the covariance matrix. Here toy samples were gener- 
ated from Gaussian model with a known true covariance, 
and the shrinkage target is diagonal and 30% wrong. The 
shrinkage estimator is significantly closer to the true re- 
sult than using either the direct sample estimate or the 
diagonal target, as expected. Using a purely diagonal 
covariance approximation with the variances estimated 
from the samples would give a similar result. As more 
samples are used the shrinkage estimator gradually in- 
cludes more information about the off-diagonal correla- 
tions from the simulations. 

The division of the covariance into components with 
different properties can be further generalized. Cut-sky 
effects tend mostly to correlate nearby but the ma- 
trix is smooth in Z, so for example is similar to 
M(;+i)(;+2) (see Fig. ||). This suggests slicing the matrix 
into diagonal strips: Mu (for all I), M;(;+i), Afi(i+2), etc, 
where each strip is given its own shrinkage intensity. In 
practice Af;(;+„) is very small for n 3> 1 in most cases 
(the matrix is nearly band diagonal), so that the com- 
bined shrinkage estimator effectively sets n 3> 1 elements 
to the target. Using a shrinkage estimator does however 
allow the possibility of the estimator including strongly 
off-diagonal correlations if they appear to be needed by 
the samples. If the target is nearly zero for n ^ 1, then 
all the n ^ 1 elements could be lumped together with 
a single shrinkage intensity, reducing noise in the inten- 
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FIG. 4: Example log likelihood for the power spectrum am- 
plitude over 100 < I < 700 in a typical realization when 
using the exact covariance (thick solid), a prior guess diag- 
onal target that is off by a factor of 1.3 from the true covari- 
ance (dot-dashed), the covariance estimated from 2000 sam- 
ples (dashed) , and the shrinkage estimator that combines the 
estimated and target covariances to get closer to the correct 
result (thin solid). The diagonal and off-diagonal parts of the 
matrix are shrunk separately, giving a shrinkage estimator 
close to the sample estimate on the diagonal and close to the 
target on the off-diagonal. We approximate the likelihood as 
Gaussian in the Ci . 



sity due to the small size of the strips in the corners of 
the matrix. The strips with n ~ 1 will gradually change 
from the target to the sample estimator as the number 
of samples is increased. 

Fig. H compares the covariance estimated from 23000 
realistic simulations of the WMAP 5-year temperature 
maps jTsf with an analytic approximation for off-diagonal 
parts of the covariance. The analytic result agrees rather 
well, indicating that in this case an extremely large num- 
ber of simulations would be required to improve the off- 
diagonal result significantly from simulations. On the 
other hand if there was a disagreement between the sim- 
ulations and analytic result, a shrinkage estimator would 
account for this and start to correct the result. On the 
much larger diagonal analytic approximations are only 
accurate to 0(5%) (see Fig. |^), so a relatively modest 
number of simulations can improve on the analytic re- 
sult. 

Fig. 1^ shows how the shrinkage intensity varies with the 
number of simulations, both for the case of a significantly 
wrong target, and the example of WMAP where we have 
an analytic approximation accurate at the 5%-level. The 
better the target is the more simulations are required to 
improve on it from the simulations: with a significantly 
wrong target the simulation estimator rapidly dominates 
the shrinkage estimator (A* 0). With a good ana- 
lytic result the diagonal estimate is gradually improved 
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FIG. 5: Diagonal strips of the maximum-likelihood correla- 
tion matrix estimated from 23000 simulations of the WMAP 
5-year temperature maps [Q, {1,1 + 1) [thin solid], {1,1 + 2) 
[thick solid] , (/,/ + 3) [thin dashed] and {1,1 + 4) [thick dashed] ; 
the elements become very small when further away from the 
diagonal. The top figure is the raw estimate, the bottom fig- 
ure shows the result smoothed with a Gaussian kernel of width 
ai = 10. The {1,1 + 1) and {1,1 + 3) elements are much smaller 
than {1,1 + 2), {1,1 + 4) due to the near north-south symmetry 
of the foreground mask in galactic coordinates. The thin black 
solid lines in the bottom half show an analytic approximation 
for the corresponding parts of the correlation matrix jlSj. 



over a few thousand simulations, but far more simula- 
tions would be needed to correct small inaccuracies in 
the off-diagonal terms. 

A disadvantage of the shrinkage approach is that it 
does not specify how parameters should be scaled or di- 
vided into subsets with different shrinkage intensities. 
For example it may also be beneficial to divide in ranges 
of I if the accuracy of the target varies as a function of 
I. This is the case for many power spectrum estima- 
tors since the covariance can be calculated accurately in 
the noise-dominated regime, with the main inaccuracies 
coming from the sample variance over the acoustic peaks. 
The extent to which covariance errors affect parameters 
also varies as a function of I: errors in the noise domi- 
nated regime have little effect because there is no signal 
there. 
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FIG. 6: The shrinkage intensity A* against the number of 
WMAP 5-year simulation samples used, using the analytic 
approximation as the target (top) and using a target that 
is significantly wrong (bottom; the analytic approximation 
multiplied by 1.3). The shrinkage intensity is calculated sep- 
arately for each strip of the normalized matrix, the plots show 
the diagonal of the covariance (solid), first off-diagonal strip 
(short-dashed) and the second strip (long-dashed). 



B. Covariance models 



The general shrinkage estimator uses a target covari- 
ance and combines it directly with data from simulations, 
allowing general deviations from the target to be deter- 
mined with enough simulations. However in many in- 
stances we actually have rather strong priors about the 
form of the covariance, even without an accurate prior; 



for example in the case of the CMB the covariance is 
expected to be strongly diagonally dominated, and also 
smooth in I unless sharp changes are introduced by the 
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FIG. 7; The fractional difference of various approximations 
to tlie smootlied diagonal of the WMAP5 Ci covariance esti- 
mated from 23000 samples from Ref . . The thick solid line 
shows an analytic approximation, which is accurate at the 5%- 
level. Thin lines show the result from fitting a cubic spline 
to the Ci variance using 100 (dot-dashed) and 1000 (solid) 
simulation samples. Spline nodes are separated by Ai — 50. 



the simulation data using cubic splines. Fig. ^ shows that 
fitting only 100 simulation samples using cubic-splines for 
the diagonal can be as accurate as the analytic approxi- 
mation. With a thousand or more simulations the covari- 
ance diagonal can be determined to the sub-percent level. 
If required additional parameters could be introduced to 
model variations in the off-diagonal components. If the 
theoretical model dependence of the covariance is not well 
captured by the approximation of Ref. ^ , fits could also 
be made as a function of simulation parameters and then 
interpolated as required. 

An alternative to fitting a model for the covariance 
when no good model is available would be using a 
smoothed version of the estimated covariance, for exam- 
ple by applying a Gaussian smoothing kernel to each di- 
agonal strip of the matrix. If the true covariance is indeed 
smooth, this will significantly reduce the sampling noise 
and hence improve the covariance estimate. For example 
applying a smoothing kernel of width cr; = 5 to the di- 
agonal strips produces results for the likelihood in Fig. ^ 
that are similar to the shrinkage estimator, depending on 
the realization. However it does not guarantee that the 
smoothed matrix is invertible. 



IV. COMBINING LOW AND HIGH I 



choice of C;-estimator^ (e.g. see Fig. ||). If we are confi- 
dent that this structure is correct, we should be able to 
use it to improve our covariance estimator. 

The approach adopted by the WMAP team is to use 
an analytic model of the covariance and then calibrate 
it from simulations by fitting parameters to a model of 
the covariance diagonal [Q. If the model has the right 
shape this can reduce to fitting a very small number of 
parameters, which can be done accurately from a rela- 
tively small number of simulations, making this a very 
good approach. On the other hand if the actual be- 
haviour is more complicated than the assumed model, 
or the off-diagonal correlations cannot be calculated reli- 
ably analytically, this could lead to misestimation of the 
covariance. 

For analysis methods that do not introduce features 
in I, variations in the Ci covariance diagonal are smooth 
on a scale determined by the scale of the acoustic peaks. 
Re-scaling the diagonal of the covariance using a smooth 
model, and taking the correlation matrix from the ana- 
lytic result, therefore gives close to the right answer; e.g. 
using the estimator 

M/^r'i^i - piTu'Pi' (19) 
(no sum), where T is the analytic model and pi are fit to 



^ Note the WMAP team's analysis does use a sharp cut in I to 
switch between pseudo-C; weighting schemes. However each es- 
timator is separately smooth. 



We have discussed how the likelihood can easily be cal- 
culated at high I without losing very much information 
by compressing the data into a set of pseudo-C; estima- 
tors. However at low I one should be able to perform a 
more optimal analysis, which is desirable since the exact 
shape of the likelihood function depends on the partic- 
ular realization of low / modes on the sky. The WMAP 
data analysis uses a pixel based likelihood at low I [p7| , 
essentially assuming Gaussianity of a set of low resolution 
pixel values and using the pixel covariance to calculate 
the likelihood. An alternative would be to work directly 
in harmonic space, avoiding pixelization issues, as dis- 
cussed in Appendix However one calculates the low I 
likelihood, the question remains of how to combine this 
with a high-^ likelihood, given it will not be possible to 
do an exact I separation on the cut sky. 

Given a set of observed modes on the sky X, a low I 
likelihood operates on some subset of the modes given by 
Xg = Ai'X, where M is rectangular and X^ is sufficiently 
small that an exact likelihood calculation is numerically 
feasible. In general X^ will contain the low I modes we 
are aiming to analyse exactly (Z < Zoxact), along with 
some leakage from higher I > Zoxact due to the sky cut. 
It follows that in the fiducial model the remaining infor- 
mation is contained in the independent high-Z-dominated 
modes 

X> = (/- (XXt)M^[M(XXt)Mt]-i^^ X 

= X-(XXt)Aft[(x,xt)]-iX, 

w X - (XXt)Mtx,, (20) 
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FIG. 8: Input temperature map with kp2 mask (top), 
low-Z modes (XXt)MtXs with /exact = 20, /low = 80 (middle, 
see Appendix H), and the low-Z modes' contribution from / > 
'exact (bottom)! Note how the supported modes are going 
smoothly to zero at the cut edges, and the mixing from to 
I > 'exact is mostly near the edges of the cut. Colour scales 
are not the same. 




FIG. 9: Decomposition of the same map into low-' modes 
(top), independent high-' modes (middle), and the contri- 
bution from ' > 'exact to the low-' map. Here 'exact = 30, 
'low ~ 120. Note a slightly larger mask would remove much 
of the high-' contribution. 



where the third line follows for the temperature in the 
true model if the modes have been decorrelated and nor- 
malized to be white. For temperature and polarization 
the same construction can be used on the large vector 
^TEB in Eq- (|20|), though there are simplifying advan- 
tages to doing temperature and polarization separately. 
In a different model these modes should also be approx- 
imately independent as long as the fiducial model is rea- 
sonably accurate. 

From the high-Z modes X> , one can construct pseudo- 
Ci (or more optimal) estimators as normal, however the 
coupling matrix and covariance are more complicated be- 
cause the weight function is no longer local in pixel space. 
If the leakage from low to high ' is small, these should 



have roughly the same properties as the normal cut-sky 
estimators, and hence C;-estimator likelihood approxi- 
mations can be used. If the leakage is small we could 
also just ignore the correlation, which is probably fine as 
long as nothing interesting is happening over the affected 
' range. 

However as shown in Figs. ^ and ^ the mixing of 
modes to higher ' dominates around the cut edges, where 
higher-/ modes are needed so that the projected map 
goes smoothly to zero on the cut. This suggests that 
by slightly enlarging or apodizing the mask used for the 
I ^ 'exact pseudo-C/ analysis, most of the modes already 
included in the low-' likelihood would be removed. This 
enlarged cut would only be needed for some number of 
' ^ 'exact where the leakage is significant; at very high ' 
the leakage is negligible so the original cut can be used. 
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FIG. 10: Pseudo-C; temperature power spectra of a simu- 
lated map projected into low-Z modes (Zoxact = 30, /low ~ 120, 
e = 0.001, see the Appendix). Dashed shows the result using 
the 'kp2' mask (85% of the sky) used to construct the modes. 
The thin red line shows the estimated Cis using the enlarged 
'QK75' temperature mask that includes only 72% of the 
sky. The leakage to / > ioxact is relatively small, but is further 
decreased by using the larger mask. The thick line shows the 
power spectrum from the cut sky without projecting out the 
\ow-l modes. 



This is a quick and simple solution to separating the 
scales to a reasonable approximation: use an accurate 
likelihood at low I (including some modes from higher I), 
use pseudo-C; estimators with an enlarged mask over in- 
termediate scales where the leakage is important, then at 
high I use pseudo-C/ estimators with the original mask. 

Fig. |l^ shows that the leakage between scales is small 
but can further be reduced by using a larger mask over 
a range ~ 10 above Zoxact- Using an enlarged mask is 
suboptimal because some modes in the enlarged cut are 
being lost: the high-Z modes included in the low-l likeli- 
hood are largely aligned with the boundary, so by cutting 
all modes in the boundary region transverse modes are 
lost that could be included. However a small subopti- 
mality is not likely to matter if there is little interest- 
ing information in the range I > ^oxact- The separation 
method has the advantage of keeping the low and high- 
l analyses straightforward while including almost all of 
the valuable information at low I. The harmonic method 
described in the appendix is also free from pixelization er- 
rors and allows a high-resolution mask to be used: there 
is no additional loss of information from having to use an 
enlarged mask and big pixels, as is often the case when 
using smoothed pixel-based likelihoods. Simply ignoring 
the correlation between low and high-Z would be a good 
approximation in most models, but is formally incorrect 
due to double-counting of information in the overlap re- 
gion. 

A recent paper [p9| has claimed that for accurate like- 



lihood calculations on our observed CMB sky, an nearly- 
exact likelihood is required up to Z ~ 100. Although this 
is somewhat surprising given that pseudo-Q methods 
are unbiased and not far from optimal, it would still be 
tractable by direct as well as Gibbs-sampling approaches. 



CONCLUSIONS 



We have seen that approximate fast power-spectrum 
likelihoods work well on small scales even with strongly 
anisotropic noise. The error bars are slightly increased 
by using a suboptimal method, but since the effect on 
parameters is small, this is likely to be a price well worth 
paying for robustness. 

The covariance that is required to calculate the high-Z 
likelihood can be calculated from simulations, but in gen- 
eral a very large number of simulations are required to 
calculate the likelihood accurately. Using approximate 
analytical results and expected smoothness properties, 
a more accurate covariance can be calculated with far 
fewer samples by calibration from simulations. There is 
a trade off between being able to see any strange be- 
haviour in the simulations and using prior expectations 
to reduce the number of simulations required. Shrinkage 
estimators provide one way of combining analytic and 
simulation results so that if the analytic result is signif- 
icantly wrong the estimator becomes dominated by the 
simulation result. Shrinkage estimators however require 
far more samples than a model-fitting approach, which 
can obtain very accurate results from less than a thou- 
sand simulations. 

With an accurate covariance and set of power spec- 
trum estimators, the high-Z likelihood can be calculated 
quickly and easily. We showed how this could be com- 
bined with a more optimal low-Z likelihood function, com- 
pensating for the small leakage between high and low I 
while including almost all of the useful information from 
low I. We conclude that a combined likelihood function 
should be a good option for analysing foreground-cleaned 
Planck data, though further work is required to model 
the foreground uncertainties accurately. Fast and robust 
methods based on power spectra at high I and harmonic 
near-exact likelihoods at low I are likely to be a very good 
alternative to globally more accurate (but numerically 
more difficult) Gibbs sampling methods, as concluded by 
other authors |ri|, ^ . In the appendix we described a 
practical harmonic method for calculating the low-Z like- 
lihood without introducing pixelization issues, which can 
be used easily once the noise covariance is calculated in 
harmonic space. Allowing for marginalization over fore- 
ground templates, or similar method, can often take place 
in any basis, so generalizing the method suggested here 
to more realistic cases could be straightforward (see e.g. 
Ref. M). 
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APPENDIX A: COVARIANCE PRIORS 

The maximum likelihood estimator of the covariance 
of a set of zero-mean samples of a Gaussian vector n is 



r? ^ — ^ 



(Al) 



A p-dimensional symmetric positive definite matrix has 
the Wishart distribution (see e.g. Ref. [^) S ^ 
W^(n,C) if 

P^S) = ^e-lTr(sc-) (A2) 

so the covariance estimator is distributed as nC ^ 
Wp(n, C). Given the estimator, the true covariance with 
a flat prior has a distribution C ^ IWp{n, nC) given by 



P{C) 



|5'|(n-p-l)/2 



25("-p-i)prp(i(n-p- l))|C|"/2 



,-iTr(SC-i) 



(A3) 



where n > 2p. 

Assume we wish to calculate the likelihood of d, 
where d has a p-dimensional Normal distribution d ^ 
Np(0, C). Using the estimator of the covariance we want 
to marginalize out the uncertainty in the true covariance 
to give 



L{d\C) = / dC 



e 2 



(27r)p/2|C|i/2- 

|„^|(«-p-l)/2 



-e 2 



f Tr(CC"i) 



(A4) 



2i(n-p-i)p rp(i(n -p- l))|C|"/2 
The integral is just another inverse Wishart distribution 
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and can be done, so that 
- 21ogL 



n — p) log \nC + dd 
1 



= (n-p)log( 1 + ^d^C-M 



n — p 



n 



n 2 



X 



(A5) 
.(A6) 

.(A7) 



(ignoring d- independent terms) and assuming C^^ is in- 
vertible {n > p). Here (^ 0{p) for typical data) 
is the naive chi-squared one would calculate taking the 
covariance estimated from simulations to be the true co- 
variance. We are most interested in how the likelihood 
varies with parameters, so taking the derivative with re- 
spect to a parameter 9 gives 



-29 log i n — p dx^ 



do 



89 



(A8) 



Thus using the likelihood e^*^/^ gives best fit values that 
agree with the marginalized result, but we need n ^ p 
samples for the error bars to be similar. If this is not the 
case then using will underestimate the error bars by 
a fraction 0{[n — p]/[n + p])] for a fractional accuracy on 
the error bars better than a, we need n 3> 2p/a. For the 
numerical value of the log likelihood to be close we would 
also need p^. 

One approach to correct for the underestimated error- 
bars would be to multiply the log-likelihood by an esti- 
mate of the under-estimation factor. This should at least 
ensure the error-bars are consistent, though they would 
then be larger than could be obtained by using more sim- 
ulations to estimate the covariance. 

If instead of taking a flat prior for C we assume an 
inverse- Wishart prior IWp{nr,Cr), the result above is 
the same with 

(A9) 



n + Ur 

and n n+rij., which is essentially a shrinkage estimator 
of the covariance. A hierarchical Bayesian model might 
take a flat or log prior on n^, or often it is fixed at its 
minimum value for normalizability. 



APPENDIX B: 



CUT-SKY HARMONIC LOW-l 
LIKELIHOOD 



Here we present a harmonic near-exact low I likelihood 
method that has no artefacts from using a low-resolution 
pixclization, and does not rely on a map-smoothing pro- 
cedure that potentially mixes information inside and 
outside the foreground mask. The method must work 
with strongly anisotropic cusped noise as expected with 
Planck - but this is no problem in harmonic space. The 
main disadvantage is that it may be slower than pixel- 
based codes. For this discussion we neglect all non-ideal 
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complications (foregrounds etc), and assume that if the 
noise is correlated, the noise covariance can be calcu- 
lated in harmonic space. The method can be applied 
to harmonic coefficients computed from high-resolution 
pixelized maps, or directly to the output of a harmonic 
map-making algorithm (e.g. Ref. |]33| ), thereby avoiding 
pixelization issues at all stages. 

Cut-sky pseudo harmonics X are given in terms of the 
underlying full-sky harmonics X and a noise vector n by 

X = W°"X + h (Bl) 

where is the coupling matrix, which in general is 

an n X oo matrix where we consider X only up to the 
first n modes (ordered in I up to how)- Elements of the 
coupling matrix can be evaluated easily numerically up 
to moderate I in term of the harmonic coefficients of the 
window (foreground and point-source mask) Wim using 

(-1)- E ^'"V^'^^^'^^'t^'^^x 

fl h h\(l h h\ 

\0 Oj\rn -mi 7712) ^ ' 

where m = rrii — m2. 

We wish to model the covariance using only modes up 
to I = /low, and aim to compute the exact likelihood for 
I < /exact (where /exact < /low)- Since W°° is an n x 00 
matrix, the covariance of X depends on all /, so we need 
to project out dependence on / > how We define W 
to be the n x n Hermitian matrix that we can easily 
compute, so that W°° = (W, 1^) where Y is nx 00. For 
window functions that are or 1 everywhere W°° is a di- 
agonal projection matrix in pixel space, and in harmonic 
space completeness of the spherical harmonics then im- 
plies W°°W^°°t ^ W (hence YY'f = W-W'^). Eigen- 
modes e oi W, with We — Ae, then have eigenvalues 
< A < 1. Modes with A ~ have no signal variance 
and hence can be deleted without loss of information. 
The remaining modes that we want have e^y ~ 0, so 
that dependence on high / is removed. Since 

\e^Y\^ =e^W -W^)e^ \), (B3) 

modes with A ^ 1 will have the required properties: we 
just want the well supported modes (c.f. Ref. |Q). Us- 
ing sufficiently large how > /exact allows us to construct 
well supported modes that contain almost all of the infor- 
mation at / < /exact (but only incomplete information at 
/exact < / < how)- As /low ^ 00 we have W, and 

hence A {0, 1}, so for large enough how the supported 
modes are guaranteed to contain all of the information. 
Whether or not it works in practice depends on how high 
how needs to be to obtain reliable nearly-optimal results 

at / ^ /exact ■ 

We therefore define the cut-sky supported modes 

Xc = -D"^/2C/tX==£)i/2c/tX + nc, (B4) 



where W = UDU\ U is orthogonal, and rows of the 
diagonal matrix D corresponding to not well supported 
modes {Da < 1 — ei) are deleted to form D and {7; 
ei is a free parameter that determines the tolerance for 
un-modelled mixing of modes with / > how and ric is 
noise. Equalities here and below are taken to hold to 
order ei. Note that the uncorrelated noise covariance 
can be calculated including noise power from all /, so 
ei is only determining the signal leakage. For isotropic 
white noise (XcXj)^ = cr^J, and for low noise levels 
it may be a good idea to add a small amount of fake 
isotropic noise so that at / > how the noise is not 
negligible and theory leakage becomes small compared 
to the noise on these scales. Also note that we only need 
to compute the eigenvectors corresponding to small ei. 
In practice ei need not be terribly small since for Ziow 
significantly bigger than /exact the power from / > /exact 
is suppressed when we focus on modes with interesting 
signal from / < /exact- 

Neglecting the small coupling from / > how, the full 
covariance of the well-supported modes can be written 
(and then Cholesky decomposed) as 

(X^Xt) =S+ + Ns + Soi + Nc + all = LL^. (B5) 

Assuming pixel noise is uncorrelated, for the 
temperature the noise variance is given by 
Ni,„,i,„,, = J2^^nlw{syc7^s)Yi„,{s)Yi,,„,{s)* and 
Nc = D^'^/'^ij^NTJD-'^/'^ (the generalization to corre- 
lated noise is trivial if TV can be calculated). We split 
S into parts, the bit we want from 2 < I < /exact (5'+), 
contributions from other /exact < / < how i^s) that can 
be though of as part of the noise on the low / modes, 
and Sqi which is the contribution from any X monopole 
and dipole (set to be a large number so that the next 
step projects it out). Then 

Xl = L-^X, (B6) 

are uncorrelated with unit variance (in the assumed 
model), and signal covariance from the target / range 
is 

{XlXI)s = L-'S+iL^)-^ = UsDsUl (B7) 

Since the X^ have variance 1 in total, modes correspond- 
ing to small Ds will be 'noise' dominated. We therefore 
define the signal to signal plus noise eigenmodes by keep- 
ing only modes with [-Ds]ii > e for some small e (for 
other discussions of similar procedures see Ref. and 
the WMAP hkelihood code^). These new modes contain 
the interesting signal 

Xs = uIXl = UlL-^X,. (B8) 



. tittp: //lambda, gsfc.nasa. go v/product/map/dr2/likelihooc _ 
f aster_v2p2p2/™ap_f asttt .pd 
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FIG. 11: Pseudo-C; temperature power spectra of a simu- 
lated map ('kp2' mask) with the low-/ modes (/exact = 30) 
projected out using a signal/noise cut e — 0.5 (blue, marked 
points) and e = 0.1 (red, thin line) for /ic,„ — 60. The thick 
line is the power spectrum without projection. This shows the 
trade-off between including all the low-Z power and increas- 
ing leakage to higher I. The leakage between scales is smaller 
with higher Ziow so that more supported modes containing 
low-/ signal can be extracted. 



There are typically ~ (^cxact + 1)^ - 'min of these per 
field, corresponding to the number of Xim for /,nin < I < 
^cxact- However a significant fraction of the modes have 
important power from / > /exact due to the sky cut. See 
Fig. |ll| for an example of how the leakage between scales 
depends on the parameters. 

Define M, M as the coupling matrices to the under- 
lying true and pseudo harmonics (at / < /low), so that 



X, = MX + n, = U^L-^D^/^U^X - 



(B9) 
(BIO) 



Note that M should project out any underlying 
monopole and dipole. The covariance in the case of the 
temperature is then given by 



where 



(X,,X,,^) = AT, + Mdiag(Cf ^)M^ 



AT, = MNMK 



(Bll) 



(B12) 



If desired, and TVs is sufficiently non-singular, we can 
then write Ng = LsL\ and define the modes ij^Xs 
which have unit white noise (in general we can diago- 
nalize even if the matrix is nearly singular). Note that 
although the signal variance is dominated by modes with 
2 < I < /exact, M couples in power up to /low 
For the polarization we have 



E = W+E + iW-B 
B = W+B iW^F,, 



(B13) 
(B14) 



where 



(2/i + l)(2/2 + l)(2/ + l) 



I h h 

2-: 



± 



I h h 

0-2 2 



47r 
/ 

TO 



-TOl 7712 



(B15) 



and TO = 7711 — • Before continuing we change to using 
real rather than complex modes, as follows. 



1. Real harmonics 

It is convenient for numerical work to use real harmon- 
ics For the T, E, and B we define the real harmonic 
coefficients 



X 



X, 



XiQ — Xia, 



(B16) 



(1 771 1 > 0), which in the full sky case are uncorrelated 
and have variance C^^ . The real coupling matrices then 
relate the pseudo and true harmonics 



E^' - 



w^b'^ 



(B17) 
(B18) 
(B19) 



where is symmetric and W:^ is antisymmetric. The 
real matrices are related to the complex ones by 



*^l\m\l'\m'\ 



w, 



R 



l—\m\l'\m' 
R 



5R(w^/i™i/'i™'i + (-irv,|„|,._i„,,| 



*^l~\m\l'~\m'\ 



= 3 -w, 



l\m\l' \m' I 



+ (-1)" Wi 



l\m\l' — \m' I 



3fi W, 



/|m|/' \m' I 



(-1)™ Wi 



/ |m|/' — |m' I 



\m\l'0) 



w, 



■ \1'0 



lOl'O 



(B20) 



where |to|, |7n'| > and W can be replaced by W+ or 
iW^ to obtain the equivalent results for W^. The noise 
on the real harmonics (for equal and uncorrelated noise 
on Q and U) is given by 



(E«(E^)^) 
?R/T}R\T\ 



N 



(B«(B«)^)jv = W^^ 



{■E^iB^Y)N = -{B^iE^f)N 



(B21) 
W'^'^ (B22) 



where W_ is evaluated with window function wn{s) 
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2. Polarization modes 

It is convenient to re-complexify the polarization anal- 
ysis by defining P = + so that 

P = (Vr^ - iW^)P = WpP (B23) 

where Wp is Hermitian. The mode construction there- 
fore goes through exactly as for the temperature, except 
now all matrices are complex, with (P*P^^) ^ 0. The 
noise covariance under stated assumptions is given by 

{PP^)n = 2Wiy (P*P^)Ar = 0. (B24) 

The real set of modes we end up with is then X^bs = 
{Tf , 3fJ(Ps), 3(Ps)}, which includes the E/B mixed 
modes. Nearly-pure E and B would be obtained by keep- 
ing only the well supported modes of Wj,- rather than the 
well supported modes of Wp (see Ref. Q ) . To calculate 
the signal to noise eigenmodes we can, for example, take 
Cf ^ = Cf^, where Cf ^ is a high optical depth model, 
to ensure that no potentially interesting modes are lost. 

3. Implementation 



Since information at I > 4xact is subdominant, if Zexact 
is chosen so that in all models of interest the Ci are of 
well-determined shape at I > /exact, it may be possible 
to pre-compute the most time-consuming contribution 
to the covariance from /exact < / < how, and simply scale 
it by some weighted average of the spectrum over that 
/-range. Certainly for /exact ~ 30 all the spectra are ex- 
pected to be very smooth up to /low ~ 100 and this should 
work well. It could also be fixed at some fiducial model, 
but there is then a danger of biasing the likelihood from 
/ < /exact by misestimating the contribution to the vari- 
ance from higher /. 



Using /exact = 30, /low = 100, 6^ = 0.01, e = 10-3 
seems to work well (for Planck a significantly larger ei can 
be used for the temperature since the noise is very small). 
For an optimal tensor mode analysis we may want to 
push to /exact ~ 150 to get all the i?-mode power, which 
is just about numerically tractable. The low-/ harmonic 
likelihood described here has been implemented in the 
CosmoMC^ package for parameter estimation since the 
February 2008 version. 



There is some freedom in how Eq. ( Bll ) is calculated. 
Ns can be pre-computed assuming we know the noise 
model and are not fitting it from the data. The cou- 
pling matrix M has size ©(/exact^) x C(/iow^)- If there 
is plenty of memory, 0(/iow) matrices J2m ^iiim)M*^im) 
can be pre-computed for each /, so that calculating the 
covariance is quick, but taking up ©(/oxact^/iow) mem- 
ory. Calculating the likelihood is then dominated by 
the cost of Cholesky decomposition, ©(/exact^), which is 
quite fast for /exact ^ 30. Alternatively the covariance 
can be calculated on the fly at a dominating computa- 
tional cost of C'(/oxact''/iow^) (and only storing M of size 



ittp : //cosmologist . inf o/cosmomc/ 



It is possible that the low-/ likelihood can be approx- 
imated very accurately for very fast subsequent evalua- 
tion. For example the likelihood approximation of Ref. 
could be applied to maximum-likelihood power spectrum 
estimators, or parameters in a likelihood model could be 
fit to accurately reproduce a full calculation |Q . In this 
case a near-exact low-/ method would be an important 
step for testing or calibrating the approximation, and 
any speed hit of a harmonic-space approach would be 
much less important compared to possible accuracy ad- 
vantages over a pixel-based method. Another possible 
fast approximation could come from fitting Gibbs sam- 
pling results |23]. 
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